A dynamic over-sampling procedure based on sensitivity for multi-class problems
Abstract
Classification with imbalanced datasets poses a new challenge for researchers in machine learning. The problem arises when the number of patterns representing one of the classes of the dataset (usually the concept of interest) is much lower than in the remaining classes. The learning model must therefore be adapted to this situation, which is very common in real applications. In this paper, a dynamic over-sampling procedure is proposed for improving the classification of imbalanced datasets with more than two classes. This procedure is incorporated into a memetic algorithm (MA) that optimizes radial basis function neural networks (RBFNNs). To handle class imbalance, the training data are resampled in two stages. In the first stage, an over-sampling procedure is applied to the minority class to partially balance the class sizes. Then, the MA is run and the data are over-sampled at different generations of the evolution, generating new patterns of the minimum sensitivity class (the class with the worst accuracy for the best RBFNN of the population). The proposed methodology is tested on 13 imbalanced benchmark classification datasets from well-known machine learning problems and one complex problem of microbial growth. It is compared to other neural network methods specifically designed for handling imbalanced data. These methods include different over-sampling procedures in the preprocessing stage, a threshold-moving method where the output threshold is moved toward inexpensive classes, and ensemble approaches combining the models obtained with these techniques. The results show that our proposal is able to improve the sensitivity in the generalization set and obtains both a high accuracy level and a good classification level for...
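The dynamic stage described in the abstract (find the class with the lowest sensitivity for the current best model, then generate new patterns of that class) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `per_class_sensitivity`, `oversample_class`, and `dynamic_oversampling_step` are hypothetical, the classifier is reduced to a vector of predicted labels, and the synthetic patterns are produced by SMOTE-style interpolation between same-class neighbours, since the abstract does not say how new patterns are generated.

```python
import numpy as np


def per_class_sensitivity(y_true, y_pred, n_classes):
    """Fraction of correctly classified patterns for each class."""
    sens = np.full(n_classes, np.nan)
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():
            sens[c] = np.mean(y_pred[mask] == c)
    # Classes without any pattern keep NaN and are ignored later.
    return sens


def oversample_class(X, y, target, n_new, k=3, rng=None):
    """Append n_new synthetic patterns of class `target`.

    Assumption: SMOTE-style interpolation between a pattern and one of
    its k nearest neighbours of the same class.
    """
    rng = np.random.default_rng(rng)
    Xc = X[y == target]
    if len(Xc) < 2:
        # Nothing to interpolate with: fall back to plain duplication.
        synth = Xc[rng.integers(0, len(Xc), size=n_new)]
    else:
        k = min(k, len(Xc) - 1)
        # Pairwise Euclidean distances within the class.
        dist = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
        np.fill_diagonal(dist, np.inf)  # a pattern is not its own neighbour
        neighbours = np.argsort(dist, axis=1)[:, :k]
        base = rng.integers(0, len(Xc), size=n_new)
        chosen = neighbours[base, rng.integers(0, k, size=n_new)]
        gap = rng.random((n_new, 1))
        synth = Xc[base] + gap * (Xc[chosen] - Xc[base])
    X_new = np.vstack([X, synth])
    y_new = np.concatenate([y, np.full(n_new, target, dtype=y.dtype)])
    return X_new, y_new


def dynamic_oversampling_step(X, y, y_pred, n_classes, n_new, rng=None):
    """One dynamic step: over-sample the minimum sensitivity class."""
    sens = per_class_sensitivity(y, y_pred, n_classes)
    worst = int(np.nanargmin(sens))
    X_new, y_new = oversample_class(X, y, worst, n_new, rng=rng)
    return X_new, y_new, worst, sens
```

In a setting like the one the abstract describes, such a step would presumably be called at selected generations, with `y_pred` taken from the best RBFNN of the population. The same `oversample_class` helper could also serve for the first, static stage by passing the minority class as `target`.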
Similar resources
Determination of relative agrarian technical efficiency by a dynamic over-sampling procedure guided by minimum sensitivity
In this paper, a dynamic over-sampling procedure is proposed to improve the classification of imbalanced datasets with more than two classes. This procedure is incorporated into a hybrid algorithm (HA) that optimizes multilayer perceptron neural networks (MLPs). To handle class imbalance, the training dataset is resampled in two stages. In the first stage, an over-sampling procedure is applied...
A multi-stage stochastic programming for condition-based maintenance with proportional hazards model
Condition-Based Maintenance (CBM) optimization using Proportional Hazards Model (PHM) is a kind of maintenance optimization problem in which inspections of a system relevant to its failure rate depending on the age and value of covariates are performed in time intervals. The general approach for constructing a CBM based on PHM for a system is to minimize a long run average cost per unit of time...
A class of multi-agent discrete hybrid non linearizable systems: Optimal controller design based on quasi-Newton algorithm for a class of sign-undefinite hessian cost functions
In the present paper, a class of hybrid, nonlinear and non linearizable dynamic systems is considered. The noted dynamic system is generalized to a multi-agent configuration. The interaction of agents is presented based on graph theory and finally, an interaction tensor defines the multi-agent system in leader-follower consensus in order to design a desirable controller for the noted system. A...
Dynamic Cargo Trains Scheduling for Tackling Network Constraints and Costs Emanating from Tardiness and Earliness
This paper aims to develop a multi-objective model for scheduling cargo trains faced by the costs of tardiness and earliness, time limitations, queue priority and limited station lines. Based upon the Islamic Republic of Iran Railway Corporation (IRIRC) regulations, passenger trains enjoy priority over other trains for departure. Therefore, the timetable of cargo trains must be determined based...
A heuristic method for consumable resource allocation in multi-class dynamic PERT networks
This investigation presents a heuristic method for consumable resource allocation problem in multi-class dynamic Project Evaluation and Review Technique (PERT) networks, where new projects from different classes (types) arrive to system according to independent Poisson processes with different arrival rates. Each activity of any project is operated at a devoted service station located in a n...
Journal title:
- Pattern Recognition
Volume 44
Publication date: 2011